Bidirectional Parsing Of Lexicalized Tree Adjoining Grammars
Abstract
In this paper a bidirectional parser for Lexicalized Tree Adjoining Grammars will be presented. The algorithm takes advantage of a peculiar characteristic of Lexicalized TAGs, i.e. that each elementary tree is associated with a lexical item, called its anchor. The algorithm employs a mixed strategy: it works bottom-up from the lexical anchors and then expands (partial) analyses making top-down predictions. Even if such an algorithm does not improve the worst-case time bounds of already known TAG parsing methods, it could be relevant from the perspective of linguistic information processing, because it employs lexical information in a more direct way.

1. Introduction

Tree Adjoining Grammars (TAGs) are a formalism for expressing grammatical knowledge that extends the domain of locality of context-free grammars (CFGs). TAGs are tree rewriting systems specified by a finite set of elementary trees (for a detailed description of TAGs, see (Joshi, 1985)). TAGs can cope with various kinds of unbounded dependencies in a direct way because of their extended domain of locality; in fact, the elementary trees of TAGs are the appropriate domains for characterizing such dependencies. In (Kroch and Joshi, 1985) a detailed discussion of the linguistic relevance of TAGs can be found.

Lexicalized Tree Adjoining Grammars (Schabes et al., 1988) are a refinement of TAGs such that each elementary tree is associated with a lexical item, called the anchor of the tree. Therefore, Lexicalized TAGs conform to a common tendency in modern theories of grammar, namely the attempt to embed grammatical information within lexical items. Notably, the association between elementary trees and anchors also improves parsing performance, as will be discussed below.
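The association between elementary trees and anchors can be pictured as an index from lexical items to trees. The following is a minimal sketch, not the authors' implementation: the names `ElementaryTree`, `index_by_anchor`, `select_trees`, and the toy grammar are all hypothetical, and the trees carry only a label instead of real tree structure. It shows how the trees relevant to an input string could be picked out by looking up the anchors that occur in it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementaryTree:
    """Hypothetical stand-in for an elementary tree (structure omitted)."""
    name: str    # illustrative label, not from the paper
    kind: str    # "initial" or "auxiliary"
    anchor: str  # lexical item the tree is associated with


def index_by_anchor(trees):
    """Map each lexical anchor to the elementary trees it anchors."""
    index = defaultdict(list)
    for tree in trees:
        index[tree.anchor].append(tree)
    return index


def select_trees(index, words):
    """Return the trees whose anchor occurs in the input, in first-seen order."""
    selected = []
    seen = set()
    for word in words:
        for tree in index.get(word, []):
            if tree not in seen:
                seen.add(tree)
                selected.append(tree)
    return selected


# Toy grammar: an assumption made purely for illustration.
TOY_GRAMMAR = [
    ElementaryTree("alpha_john", "initial", "John"),
    ElementaryTree("alpha_sleeps", "initial", "sleeps"),
    ElementaryTree("alpha_eats", "initial", "eats"),
    ElementaryTree("beta_often", "auxiliary", "often"),
]

ANCHOR_INDEX = index_by_anchor(TOY_GRAMMAR)
SELECTED = select_trees(ANCHOR_INDEX, "John often sleeps".split())
print([tree.name for tree in SELECTED])
```

In this toy setting the tree anchored by "eats" is never selected for the input "John often sleeps", so a parser working from the selected set would only consider three of the four trees.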
Various parsing algorithms for TAGs have been proposed in the literature: the worst-case time complexity varies from O(n^4 log n) (Harbusch, 1990) to O(n^6) (Vijay-Shanker and Joshi, 1985; Lang, 1990; Schabes, 1990) and O(n^9) (Schabes and Joshi, 1988).

*Part of this work was done while Giorgio Satta was completing his Doctoral Dissertation at the University of Padova (Italy). We would like to thank Yves Schabes for his valuable comments. We would also like to thank Anne Abeillé. All errors are of course our own.

As for Lexicalized TAGs, in (Schabes et al., 1988) a two-step algorithm has been presented: during the first step the trees corresponding to the input string are selected, and in the second step the input string is parsed with respect to this set of trees. Another paper by Schabes and Joshi (1989) shows how parsing strategies can take advantage of lexicalization in order to improve parsers' performance. Two major advantages have been discussed in the cited work: grammar filtering (the parser can use only a subset of the entire grammar) and bottom-up information (further constraints are imposed on the way trees can be combined). Given these premises and starting from an already known method for bidirectional CF language recognition (Satta and Stock, 1989), it seems quite natural to propose an anchor-driven bidirectional parser for Lexicalized TAGs that tries to make more direct use of the information contained within the anchors. The algorithm employs a mixed strategy: it works bottom-up from the lexical anchors and then expands (partial) analyses making top-down predictions.

2. Overview of the Algorithm

The algorithm that will be presented is a recognizer for Tree Adjoining Languages: a parser can be obtained from such a recognizer by additional processing (see final section). As an introduction to the next section, an informal description of the algorithm is presented here. We assume the following definition of TAGs.
Definition 1. A Tree Adjoining Grammar (TAG) is a 5-tuple G = (V_N, V_Σ, S, I, A), where V_N is a finite set of non-terminal symbols, V_Σ is a finite set of terminal symbols, S ∈ V_N is the start symbol, and I and A are two finite sets of trees, called initial trees and auxiliary trees respectively. The trees in the set I ∪ A are called elementary trees.

We assume that the reader is familiar with the definitions of adjoining operation and foot node (see (Joshi, 1985)). The proposed algorithm is a tabular method that accepts a TAG G and a string w as input, and decides whether w ∈ L(G). This is done by recovering (partial) analyses for substrings of w and by combining them. More precisely, the algorithm factorizes analyses of derived trees by employing a specific structure called state. Each state retains a pointer to a node n in some tree α ∈ I ∪ A, along with two additional pointers (called ldot and rdot) to n itself or to
Similar resources
Head-Corner Parsing for TAG
This paper describes a bidirectional head-corner parser for (unification-based versions of) Lexicalized Tree Adjoining Grammars.
Some Experiments on Indicators of Parsing Complexity for Lexicalized Grammars
In this paper, we identify syntactic lexical ambiguity and sentence complexity as factors that contribute to parsing complexity in fully lexicalized grammar formalisms such as Lexicalized Tree Adjoining Grammars. We also report on experiments that explore the effects of these factors on parsing complexity. We discuss how these constraints can be exploited in improving efficiency of parsers for ...
Lexicalization and Grammar Development
In this paper we present a fully lexicalized grammar formalism as a particularly attractive framework for the specification of natural language grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We illustrate the advantages of lexicalized grammars in various contexts of natural language processing,...
Practical experiments in parsing using Tree Adjoining Grammars
We present an implementation of a chart-based head-corner parsing algorithm for lexicalized Tree Adjoining Grammars. We report on some practical experiments where we parse 2250 sentences from the Wall Street Journal using this parser. In these experiments the parser is run without any statistical pruning; it produces all valid parses for each sentence in the form of a shared derivation forest. ...
Resources for Lexicalized Tree Adjoining Grammars and XML Encoding: TagML
This work addresses both practical and theoretical purposes for the encoding and the exploitation of linguistic resources for feature-based Lexicalized Tree Adjoining Grammars (LTAG). The main goals of these specifications are the following: 1. Define a recommendation by way of an XML (Bray et al., 1998) DTD or schema (Fallside, 2000) for encoding LTAG resources in order to exchange gram...